Time-Series Classification in Many Intrinsic Dimensions

Authors

  • Milos Radovanovic
  • Alexandros Nanopoulos
  • Mirjana Ivanovic
Abstract

In many data mining tasks, high dimensionality has been shown to pose significant problems, commonly referred to as different aspects of the curse of dimensionality. In this paper, we investigate, in the time-series domain, one aspect of the dimensionality curse called hubness: the tendency of some instances in a data set to become hubs by being included in unexpectedly many k-nearest neighbor lists of other instances. Through empirical measurements on a large collection of time-series data sets, we demonstrate that the hubness phenomenon is caused by the high intrinsic dimensionality of time-series data, and we shed light on the mechanism through which hubs emerge, focusing on the popular and successful dynamic time warping (DTW) distance. We also investigate the interaction between hubness and the information provided by class labels by considering label matches and mismatches between neighboring time series. Following our findings, we formulate a framework for categorizing time-series data sets based on measurements that reflect hubness and the diversity of class labels among nearest neighbors. The framework allows one to assess whether hubness can be successfully used to improve the performance of k-NN classification. Finally, the merits of the framework are demonstrated through experimental evaluation of 1-NN and k-NN classifiers, including a proposed weighting scheme designed to make use of hubness information. Our experimental results show that, in the majority of cases, the examined framework correctly reflects the circumstances in which hubness information can effectively be employed in k-NN time-series classification.
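As a rough illustration of the hubness notion described in the abstract, the sketch below counts how often each time series appears in the k-nearest neighbor lists of the other series under a textbook DTW distance, and summarizes the counts by their skewness. It is a minimal sketch and not the authors' implementation: the function names `dtw_distance`, `k_occurrences`, and `skewness`, the squared pointwise cost, the unconstrained warping window, the index-based tie-breaking, and the toy data are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook unconstrained DTW with squared pointwise cost (assumed variant)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def k_occurrences(series, k, dist=dtw_distance):
    """Count, for each series, how many other series list it among their k nearest neighbors."""
    n = len(series)
    counts = [0] * n
    for i in range(n):
        # Rank all other series by distance to series[i]; ties are broken by index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (dist(series[i], series[j]), j),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment; a clearly positive value suggests a few hubs dominate."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    toy = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [5, 5, 5, 5], [9, 9, 9, 9]]
    occurrences = k_occurrences(toy, k=2)
    print(occurrences, skewness(occurrences))
```

Every series contributes exactly k entries to the neighbor lists, so the counts always sum to n·k; hubness shows up as an uneven split of that total, which is why a skewness-style summary is a natural first diagnostic.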

Similar resources

On The Behavior of Malaysian Equities: Fractal Analysis Approach

Fractal analysis of continuous processes has recently emerged in the literature in various domains. The existence of long memory in many processes, including financial time series, has been evidenced via different methodologies in many studies in the past decade, which has inspired many recent studies on quantifying the fractional Brownian motion (fBm) characteristics of financial time series. Th...

Classification of Iranian Contemporary Architecture, Based on Trends and Challenges

Demands such as "Iranian-Islamic architecture" or "preservation of Iranian-Islamic identities" have appeared in different dimensions and have gradually shaped contemporary Iranian architecture. Many criticisms of the architectural conditions have been made from various perspectives; although all of them are worthy of attention, it seems that a required issue has been neglecte...

The Major Determinants of Sustainable Development in Selected Pacific, East and West Asian Countries

Sustainable development is a controversial concept that has been considered over the past three decades. It is comprehensive development that includes all dimensions: "economic", "social", and "environmental". In its economic objective, it requires substantial economic change that can be brought about by investment and trade. They are effective factors of sustainable developm...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Fitting of Count Time Series Models on the Number of Patients Referred to Addiction Treatment Centers in Semnan County

Count data over time are observed in many application areas. Many researchers use time series patterns to analyze these data. In this paper, Poisson and negative binomial linear count time series models with explanatory variables are studied for this type of data. The likelihood analysis and the evaluation of count time series models based on generalized linear models are pres...

Publication date: 2010